For some months now, Oracle has been making release candidates of GraalVM, a new virtual machine, available to the public (at the time of writing, RC10 is the latest version). When a piece of software names itself after the Holy Grail, that raises the bar by a few inches. Let’s refresh our memory: the Holy Grail promises its owner nothing less than vitality, youth, food in abundance and overall happiness, all in a state of quasi-immortality. The feature list of GraalVM, which emerged from a series of research projects at the Institute for System Software of the Johannes Kepler University Linz in close cooperation with Oracle Labs, is correspondingly long.
It is not without reason that the name contains only the abbreviation VM and not JVM: thanks to a different architecture, it can execute more than Java bytecode and becomes a truly polyglot virtual machine on which, in principle, any programming language can run. In this respect, GraalVM is something new. Even though the term polyglot has long been used around the JVM in reference to languages such as Groovy, Kotlin or JRuby, the JVM itself is not polyglot: on its own, it can only execute bytecode. The fact that there are compilers that generate JVM bytecode for a given language has nothing to do with the JVM itself; it merely allows us developers to target the JVM from other languages. GraalVM, on the other hand, leaves such limits behind and is designed as a universal virtual machine. As such, it supports all languages that have a compiler producing bytecode, which means that anybody working with Java, Scala, Clojure and company can already use GraalVM. It currently supports Java up to version 8; support for Java 11 and later will follow. In addition to these JVM languages, programs written in the statistical language R, in Python, Ruby or JavaScript, as well as LLVM bitcode, can also be executed by GraalVM.
But why do we need a new virtual machine at all when others are available? There is certainly no single reason, and, as with some projects, there was no grand plan at the beginning; more likely, opportunities and needs came together at the right time. In recent years, many new languages have appeared on the software development stage, and I am not referring to exotic languages or projects without practical relevance for day-to-day work. Rather, it is about languages like Scala or Kotlin that follow other paradigms, or languages like Go and Rust, developed by organizations primarily for their own needs. The motivation behind many JVM-based languages was to be better than Java in certain areas in terms of language features, and to simplify things. In other areas, Java has never been a real alternative until now, be it because installing the Java Runtime Environment (JRE) along with the application is often too cumbersome, or because the start-up time of the Java Virtual Machine is impractical for the application at hand. Trends in software architecture such as serverless computing or fast-scaling cloud architectures, which rely not on maximum throughput but on fast start-up times, also put Java under pressure. Projects like Kotlin/Native confirm this, as does the success of Go in systems development.
Until Java 9, the development of Java was rather sluggish, and the pressure to innovate is high, as I have just indicated. Evolving a language as widely used as Java is not easy, as language changes may require changes to other parts of the JDK, such as the JVM. This is where HotSpot has a problem: HotSpot is written in C++, it is complex, and it has a very old code base. Although C++ has also evolved, modernizing an existing code base is time-consuming. In addition, C++ is taught less and less at universities, which makes it harder to find new blood to work on HotSpot.
Availability and installation
Oracle provides two variants of GraalVM: the Community Edition, which can be used free of charge, and the paid Enterprise Edition. Both versions are currently only available for macOS and Linux. As might be expected, certain features are reserved for the Enterprise Edition. For example, native binaries created with the Enterprise Edition support heap dumps and DWARF debugging information. In addition, the Enterprise Edition can apply better optimizations than the Community Edition. The Community Edition is available through the project website, the Enterprise Edition through the Oracle Technology Network. Both editions can be downloaded as a .tar.gz file; for installation, it is sufficient to unpack the archive and point the JAVA_HOME variable at the unpacked directory.
A look back
The HotSpot JVM was introduced to the then very young Java world in April 1999, initially as an add-on for Java 1.2 and as the standard VM from Java 1.3 onwards. Java 1.0 had been released three years earlier, but could handle neither inner classes nor reflection. The Collections Framework was not added until Java 1.2, which was also the first version to ship with a just-in-time (JIT) compiler for bytecode that up to then had only been interpreted. That was a good twenty years ago, so quite a few of our readers may not even have been around at the time. Application development back then was dominated by C and C++. Many business applications were also developed with Borland’s Delphi, and the Internet was slowly being noticed by a broader audience.
Given the dominance of C and C++, most fledgling Java programmers had previously been C/C++ programmers and found the switch to Java easy because of the great similarity between the languages. Due to their wide distribution and high speed, the HotSpot JVM itself was written in C++ with parts in assembler. The Java code written at that time was also quite different from the code that would be written today: where streams would be used now, for loops prevailed back then, and when performance became critical, the approach was to avoid creating new objects or to cache them. The introduction of the HotSpot JVM brought significant improvement here thanks to just-in-time compilation. In particular, dynamic optimizations performed by HotSpot such as inlining, dead code elimination or loop unrolling contributed significantly to better performance. Java code using up-to-date features such as autoboxing, lambdas and streams is more elegant and expressive on the one hand, but on the other hand it is also slower than classic constructs such as the for loop. The reason is that many short-lived objects are often created in the background, which then have to be cleaned up again. The fact that garbage collectors have become more efficient and object creation keeps getting cheaper does not solve this problem either.
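To make this concrete, here is a minimal, hypothetical comparison (not taken from the article’s benchmark): both methods compute the same sum, but the stream variant boxes every value into a short-lived Integer object that the garbage collector later has to reclaim.

import java.util.stream.IntStream;

public class SumVariants {

    // Classic loop: works on primitive ints only, no objects are allocated.
    static long sumWithLoop(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }

    // Boxed stream: each int is wrapped in an Integer, creating many
    // short-lived objects before the values are summed up again.
    static long sumWithBoxedStream(int n) {
        return IntStream.range(0, n)
                .boxed()                       // int -> Integer
                .mapToLong(Integer::longValue)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumWithLoop(1_000_000));
        System.out.println(sumWithBoxedStream(1_000_000));
    }
}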
All the more important, therefore, are techniques that optimize the generated code more thoroughly or, ideally, avoid creating new objects on the heap altogether. One basis for this is the so-called escape analysis. Here, the possible execution paths of a section of code are examined to check whether the objects it creates can leave the current scope. Examples are adding a new object to a Map passed in as a parameter, or returning it as a return value: in these cases, the object may continue to exist and be used after the creating method has completed. If this can be ruled out by static code analysis, the object allocation can be eliminated in the examined method and variables can be used instead. At best, these would be variables of primitive types that live only on the stack. But even if the variables were references to objects on the heap, such an optimization would benefit garbage collection, since the structures to be analyzed would be simpler.
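As a rough sketch of the kind of code escape analysis targets (the Point class and both methods are hypothetical examples, not part of the benchmark discussed later): the Point created in distanceFromOrigin never leaves that method, so the JIT compiler may eliminate the allocation and work with plain stack values instead (scalar replacement), whereas the instance returned by createPoint escapes and has to stay on the heap.

public class EscapeAnalysisDemo {

    // A small value class; instances of it are candidates for scalar replacement.
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    // The Point created here never escapes this method: it is not returned,
    // not stored in a field and not passed to other code, so escape analysis
    // may remove the allocation and work with two doubles on the stack.
    static double distanceFromOrigin(double x, double y) {
        Point p = new Point(x, y);
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }

    // Here the object does escape: the caller receives the reference,
    // so the allocation cannot be removed.
    static Point createPoint(double x, double y) {
        return new Point(x, y);
    }

    public static void main(String[] args) {
        double sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += distanceFromOrigin(i, i + 1);
        }
        System.out.println(sum + " " + createPoint(1, 2));
    }
}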
The Architecture of Graal
Before we can turn our attention to the GraalVM internals, it’s important to revisit the Java Development Kit (JDK) and the JVM. The JDK includes all the components needed to develop and run Java programs. For development, it provides tools such as the Java compiler javac. Execution, however, is handled by the Java Runtime Environment (JRE), which is available either standalone or as part of the JDK. Sitting at the heart of the JRE is the HotSpot JVM (java).
The JVM (Fig. 1) is an implementation of the Java Virtual Machine Specification and consists of several components. The class loader subsystem is responsible for loading, verifying and linking class files and then initializing static fields and executing static code blocks. Simply put, the runtime data areas contain all data necessary for execution, such as the heap or stack of the current thread. The execution engine provides the interpreter, the JIT compiler, and the garbage collector. In the context of GraalVM, the JIT compiler is of particular interest. Java code is initially executed only by the interpreter in the execution engine.
This means that each bytecode instruction is translated one-to-one and executed without any optimization. Once sufficient information has been gathered by the profiler during execution and a method has been executed often enough, the work of the JIT compiler begins. Based on the collected profiling data, the JIT compiler can decide how to optimize the method before translating it into machine code. The HotSpot JVM includes two JIT compilers: C1, a fast and only lightly optimizing compiler originally designed for desktop applications, and C2, an aggressively optimizing compiler for server applications that demand the highest throughput.
For a long time, these two compilers of the HotSpot JVM could not be replaced. In 2014, JDK Enhancement Proposal (JEP) 243 proposed a JVM Compiler Interface (JVMCI) to make it possible to plug in other compilers, even ones written in Java itself. Beyond the modularization itself, this demonstrated the performance of Java and created a broader basis for development work on the JVM.
Contrary to what the name suggests, the heart of GraalVM is not a virtual machine of its own, but rather the highly optimizing Graal compiler, written entirely in Java. GraalVM uses OpenJDK as its basis and integrates the Graal compiler via the JVMCI, where it replaces C2. The name GraalVM was probably chosen for marketing reasons; it would be more accurate to speak of ‘OpenJDK/Oracle JDK with the Graal compiler and additional tooling’.
For a compiler to support different languages, it must work with a language-independent intermediate representation between the source language and the machine code to be generated. In the case of Graal, a graph was chosen as this intermediate representation. The main advantage of a graph is that similar statements from different languages can be represented in the same way: a foreach loop in Python and one in Java map to the same graph structure, as does an if statement in almost any language. This language-independent representation also makes it possible to use several languages in the same program. To treat them as one program from the compiler’s perspective, all that is needed is to build a common graph from them, on which language-independent optimizations can then be performed and machine code can be generated.
The Graal compiler is only one component of GraalVM, even if it is the central one. GraalVM also provides the LLVM bitcode interpreter Sulong, which makes it possible to run on GraalVM any language for which an LLVM frontend exists. SubstrateVM allows native binaries to be generated from Java programs via ahead-of-time (AOT) compilation. The Truffle framework, which is also written in Java and is the basis for supporting other languages, provides an API for building interpreters for programming languages, which can then be executed on GraalVM. Thanks to this execution by GraalVM, the supported languages also benefit from the optimizations of the Graal compiler. Figure 2 shows how the individual components interact.
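As a small, optional illustration of how a Truffle-based language can be embedded from Java, the following sketch uses the GraalVM polyglot SDK (org.graalvm.polyglot) to evaluate a JavaScript expression; it assumes a GraalVM with the js language installed and is not part of the benchmark discussed below.

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Value;

public class PolyglotSketch {
    public static void main(String[] args) {
        // Create a polyglot context and evaluate a JavaScript expression.
        // The Truffle-based JS interpreter is optimized by the Graal compiler
        // just like ordinary Java code.
        try (Context context = Context.create()) {
            Value result = context.eval("js", "21 * 2");
            System.out.println("JS says: " + result.asInt());
        }
    }
}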
Performance in comparison
To keep the promise of always achieving high execution speed regardless of the source language, the optimization capabilities of the Graal compiler are crucial. To compare the achievable performance of GraalVM with that of other JVMs, we use the Top Ten program by Chris Seaton. It uses Java’s stream API to calculate the ten most frequent words in an approximately 144 MB text file and has been rewritten as a JMH benchmark for the measurements (Listing 1).
Listing 1:
import static java.lang.String.format;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.infra.Blackhole;

public class TopTenBenchmark {

    private Stream<String> fileLines(String path) {
        try {
            return Files.lines(Paths.get(path));
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    // Streams the text file, splits it into words, counts them and keeps
    // only the ten most frequent entries.
    @BenchmarkMode(Mode.SampleTime)
    @Benchmark
    public void topten(Blackhole blackhole) {
        Arrays.stream(new String[]{"large.txt"})
            .flatMap(this::fileLines)
            .flatMap(line -> Arrays.stream(line.split("\\b")))
            .map(word -> word.replaceAll("[^a-zA-Z]", ""))
            .filter(word -> word.length() > 0)
            .map(String::toLowerCase)
            .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
            .entrySet().stream()
            .sorted((a, b) -> -a.getValue().compareTo(b.getValue()))
            .limit(10)
            .forEach(e -> blackhole.consume(format("%s = %d%n", e.getKey(), e.getValue())));
    }
}
To execute the benchmark, the results of which are summarized in Table 1 for the different JDK versions, a MacBook Pro 2017 with a 2.5 GHz Intel Core i7 was used.
Table 1
Execution time of the Top Ten benchmark for different JDKs:
JDK | Time in seconds
--- | ---
Oracle JDK 8u192 | 15.801
OpenJDK 8u192 | 15.250
GraalVM 1.0 CE RC 10 | 13.814
GraalVM 1.0 EE RC 10 | 9.867
OpenJDK 11 (OpenJ9) | 26.268
OpenJDK 11 | 16.920
Oracle JDK 11 | 17.102
For each JDK compared, the benchmark was run with five warm-up iterations and five measurement iterations. In the case of GraalVM, we have to keep in mind that the JVM first has to compile the Graal compiler itself, which is written in Java. For the chosen example, both GraalVM editions achieve the best execution times: the Community Edition takes about fourteen seconds, while the Enterprise Edition beats that by four seconds with a result of around ten seconds. Last comes OpenJDK 11 with the OpenJ9 JVM, more than sixteen seconds behind the best result. These values cannot and should not be generalized, but they do show the order of magnitude at which the individual virtual machines operate. The fact that the Enterprise Edition of GraalVM is significantly faster than the Community Edition shows that Oracle deliberately draws a line between what is free and what has to be paid for.
Creating native binaries
The desire to run Java programs as real native binaries of the operating system, without going through the Java Virtual Machine, has been around for a long time. This would make distributing and installing Java programs easier, as an additional JRE installation would no longer be needed. Nor would the .jar files of required libraries have to be distributed, or the appropriate classpath determined for execution.
Furthermore, a native binary has the advantage of starting faster, because it has already been translated into machine code before execution (AOT compilation) and does not have to be translated at runtime by the JIT compiler. The performance behavior of native binaries is therefore constant and predictable, because the code is no longer optimized at runtime by the JIT compiler in interaction with the profiler. Native binaries also need much less memory, because the JVM infrastructure for the JIT compiler does not have to run.
Oracle has been working on improving start-up time across several JDK versions through techniques such as CDS (Class Data Sharing), AppCDS (Application Class Data Sharing) and the AOT compiler jaotc, introduced experimentally with Java 9, which itself uses an earlier version of the Graal compiler. However, each of these techniques still requires an installed JRE, and although measurable improvements in start-up speed were achieved, they were not significant.
For the creation of native binaries, GraalVM comes with the native-image tool, which can be used to translate individual .jar or .class files into machine code. The challenge with Java programs, as with all JVM-based languages, is the ability to load classes dynamically at runtime. This powerful feature is often used to determine at runtime which frameworks are available. For example, a program could check at runtime which logging framework is on the classpath and configure its logging depending on the result of this search; the best-known example of this kind of runtime configuration is the Spring Framework. Dynamic behavior like this is a big challenge for ahead-of-time compilation.
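The following sketch illustrates this kind of runtime detection; the helper itself is a hypothetical example (the class names are common logging entry points), not code from the article. Which branch is taken depends entirely on what happens to be on the classpath at runtime, which is exactly what a closed-world ahead-of-time analysis struggles with.

public class LoggingDetector {

    // Returns the name of the first logging facade found on the classpath.
    // The decision is made at runtime via dynamic class loading, so an
    // ahead-of-time compiler cannot know which branch will be needed.
    static String detectLoggingFramework() {
        String[] candidates = {
                "org.slf4j.LoggerFactory",              // SLF4J
                "org.apache.logging.log4j.LogManager",  // Log4j 2
                "java.util.logging.Logger"              // JDK logging, always present
        };
        for (String className : candidates) {
            try {
                Class.forName(className);
                return className;
            } catch (ClassNotFoundException e) {
                // not on the classpath, try the next candidate
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println("Using: " + detectLoggingFramework());
    }
}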
native-image, which internally also uses the Graal compiler for the translation, therefore analyzes all possible paths the program can take for a given classpath during compilation. This analysis results in a significantly longer compilation time compared to javac or the LLVM compiler clang, but in return it reliably determines which classes are needed and which are not. Programs generated with native-image are therefore orders of magnitude smaller (usually only a few megabytes) than the sum of the JDK/JRE and the required libraries. Usually, Java programmers do not have to worry about thread and memory management, because the JVM takes care of that for them, and this also applies to native binaries: it is handled by SubstrateVM, which is itself written in Java and is compiled into the generated binary by native-image.
Yet there are also peculiarities to consider when creating native binaries. Java code often uses static initialization, that is, the ability to initialize static fields directly when the class is loaded, either through a direct assignment or in a static block. By default, native-image executes static initializers during compilation, which improves start-up time but causes the generated program to run with exactly the same values every time it is executed.
If only lists of constants are filled during static initialization, this is not critical. If, however, run-time-dependent values such as a date are assigned, the program will always work with exactly the value captured at build time.
Listing 2:
package net.sweblog.playground.graal.weihnachten;

import java.time.Duration;
import java.time.LocalDateTime;

public class DaysUntilChristmas {

    // Initialized when the class is loaded; by default native-image runs this at build time.
    private static LocalDateTime now = LocalDateTime.now();

    public static void main(String[] args) {
        LocalDateTime christmas = LocalDateTime.of(now.getYear(), 12, 1, 0, 0);
        Duration timeLeft = Duration.between(now, christmas);
        System.out.println("Christmas is in " + timeLeft);
    }
}
Listing 3:
> export JAVA_HOME=/opt/graal/graalvm-ce-1.0.0-rc10/Contents/Home/
> mkdir out
> $JAVA_HOME/bin/javac -d out src/main/java/net/sweblog/playground/graal/weihnachten/DaysUntilChristmas.java
> $JAVA_HOME/bin/native-image --no-server -H:Name=duc -cp out net.sweblog.playground.graal.weihnachten.DaysUntilChristmas
[duc:41559]    classlist:   1,426.77 ms
[duc:41559]        (cap):     855.38 ms
[duc:41559]        setup:   2,057.42 ms
[duc:41559]   (typeflow):   2,604.42 ms
[duc:41559]    (objects):     617.27 ms
[duc:41559]   (features):     109.90 ms
[duc:41559]     analysis:   3,389.01 ms
[duc:41559]     universe:     165.76 ms
[duc:41559]      (parse):     588.47 ms
[duc:41559]     (inline):   1,256.12 ms
[duc:41559]    (compile):   5,068.60 ms
[duc:41559]      compile:   7,150.38 ms
[duc:41559]        image:     401.07 ms
[duc:41559]        write:     154.97 ms
[duc:41559]      [total]:  14,988.09 ms
> ./duc && sleep 5 && ./duc
Christmas is in PT7516H23M23.698S
Christmas is in PT7516H23M23.698S
Listing 2 shows a sample program that calculates the time remaining until Christmas. Listing 3 shows how this program is compiled and then executed twice. It is clearly visible that the now field carries the same timestamp in every execution: the remaining time is always the same.
Another Java feature poses a problem for ahead-of-time compilation: reflection. When analyzing the bytecode to be translated, native-image tries as far as possible to determine the result of each use of the reflection API automatically. This is possible if the result is constant, for example if static analysis can determine that the value of className in a call to Class.forName(String className) is always the same. In such a case, the statement can be rewritten as if the class in question were instantiated directly. Any use of reflection that does not yield a precisely predictable result leads to an error.
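The following hypothetical sketch contrasts the two cases: the first lookup uses a compile-time constant and can be resolved during static analysis, while the second depends on a value that only becomes known at runtime and therefore needs additional configuration for native-image.

public class ReflectionExamples {

    // The class name is a compile-time constant, so the analysis can fold
    // this call and treat it like a direct instantiation of StringBuilder.
    static Object constantLookup() throws Exception {
        Class<?> clazz = Class.forName("java.lang.StringBuilder");
        return clazz.getDeclaredConstructor().newInstance();
    }

    // Here the class name only becomes known at runtime. Without additional
    // reflection configuration, native-image cannot predict the result.
    static Object dynamicLookup(String classNameFromConfig) throws Exception {
        Class<?> clazz = Class.forName(classNameFromConfig);
        return clazz.getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(constantLookup().getClass().getName());
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        System.out.println(dynamicLookup(name).getClass().getName());
    }
}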
But for each of these limitations there is also a solution. If the static initialization of values cannot or should not be avoided, native-image offers the --delay-class-initialization-to-runtime option, which takes a comma-separated list of classes whose initialization should only be performed at runtime. If GraalVM becomes more widespread, statically initializing run-time-dependent fields is likely to be considered bad practice in the future.
Likewise, any information necessary for correct reflection support can be specified either via configuration files or programmatically, using the API provided for SubstrateVM and GraalVM. This API can also be used to adapt third-party frameworks such as Netty or Tomcat so that native binaries can be generated from them.
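As a rough sketch of the programmatic route (the exact package and option names have shifted between GraalVM releases, so treat the details as assumptions rather than a fixed API), a Feature implementation can register classes for reflective access at image build time and be passed to native-image, for example via a --features option.

import org.graalvm.nativeimage.hosted.Feature;
import org.graalvm.nativeimage.hosted.RuntimeReflection;

// Registers a class and its members for reflective access during the image build.
// Enabled, for example, with: native-image --features=ReflectionRegistrationFeature ...
public class ReflectionRegistrationFeature implements Feature {

    @Override
    public void beforeAnalysis(BeforeAnalysisAccess access) {
        try {
            Class<?> clazz = Class.forName("java.util.ArrayList");
            RuntimeReflection.register(clazz);
            RuntimeReflection.register(clazz.getDeclaredConstructors());
            RuntimeReflection.register(clazz.getDeclaredMethods());
        } catch (ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}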
Who should be interested in native binaries? Anyone who can do without Java’s write-once-run-anywhere promise, or for whom fast start-up is more important than maximum throughput, which can only be achieved through the combination of runtime profiling and the JIT compiler. Listing 4 shows how big the difference between AOT and JIT compilation can be: it compares the execution time of Top Ten in a single run, measured from the command line, once as a normal Java program on GraalVM and once as a native binary.
Listing 4:
> /usr/bin/time $JAVA_HOME/bin/java -cp target/classes net.sweblog.playground.graal.TopTen 1>&-
       16.25 real        22.21 user         0.67 sys
> /usr/bin/time ./TopTen 1>&-
       36.25 real        35.99 user         0.18 sys
As a native binary, Top Ten runs around twenty seconds longer than when executed on GraalVM. The fact that the real and user times of the native binary are close to each other suggests that the execution did not use multiple threads. A Hello World running as a native binary, on the other hand, takes no longer than its C counterpart. Native binaries finally open up the possibility of using Java in systems programming, an area for which Java has never been considered before, because the effective running time of such tools has up to now never been in any reasonable proportion to the start-up time of the JVM.
Conclusion
My goal was to provide a general introduction to GraalVM for Java developers and to present its performance characteristics and its ability to generate native binaries. Other features, such as the execution of Python or Ruby and polyglot development, are beyond the scope of this article and a topic for a separate discussion.
Even when GraalVM is used simply as a normal JVM, it pushes the other JVMs into the back ranks in this scenario alone. Together with the possibility of using it, on the basis of the Truffle framework, as a platform for projects in the languages already mentioned here several times, of implementing domain-specific languages with relatively little effort, and of developing polyglot applications, GraalVM represents a breakthrough whose effect on software development we are not yet able to assess. Oracle may have laid the groundwork for the next twenty years of software development with Java, which probably contributes more to the future of Java than some of the language features people like to talk about so passionately. In the end, all that remains is the wish for an early final release and the continuous development of the platform.